The ARC-AGI-3 benchmark challenges AI systems to match untrained human performance in interactive environments, with no frontier model achieving more than 1% success. The test strips away AI's typical advantages, exposing a gap in reasoning and adaptability.